ABSTRACT

We show that Neural Nets can be useful for top analysis at the Tevatron. The main
features of $t\bar{t}$ and background events in a mixed sample are projected onto a single
output, which controls the efficiency and purity of the $t\bar{t}$ signal.

The announced discovery of the top quark by CDF at the Tevatron has caused great
excitement in the scientific community. Although the statistics are still too limited* to
establish the existence of the top quark, it is nevertheless natural to interpret the excess of
events as $t\bar{t}$ production. The experimental situation will certainly improve in the coming
months and the top will hopefully be confirmed. From the theoretical point of view, the
consistency of the Standard Model demands the top to be the partner of the bottom quark,
ensuring the absence of flavor changing neutral currents [ ]. The CDF value of the top mass,
$m_t = 174 \pm 10^{+13}_{-12}$ GeV, is consistent with recent theoretical studies of radiative
corrections combined with precision measurements of the Z boson mass and the strong coupling
constant at LEP, which lead to a central value of $m_t = 165$ GeV.

The dominant top production mechanism at the Tevatron is $q\bar{q} \to t\bar{t}$, followed by
$gg \to t\bar{t}$. Once produced, the top decays into $bW$, with the subsequent
$W \to \ell\nu,\ q\bar{q}'$ decay, inside the detector. There are therefore three possible final
states for the $t\bar{t}$ signal which, in order of increasing branching ratio, are:

1. Two charged leptons, missing energy and two jets 

2. One charged lepton, missing energy and four jets 

3. Six jets. 

They require different search strategies, and different backgrounds have to be considered
for each. The first channel suffers from a small branching ratio and from the presence of two
undetected neutrinos, which makes top reconstruction unfeasible. It has been analyzed in terms
of the correlations among the charged leptons [ ] and, recently, it has been suggested that it
can be separated from its possible backgrounds [ ]. The most investigated channel so far is the
one containing one charged lepton [7]. It has a sizeable branching ratio with a moderate
background. The neutrino still escapes detection, and hence the event cannot be completely
reconstructed. The third channel, with six final jets, is the most likely and allows full top
reconstruction, but at the expense of a huge QCD background. Recently, it has been pointed out
that tagging of a b-quark can help to obtain acceptable signal to background ratios for
$m_t < 180$ GeV [ ].

All the channels mentioned need specific experimental cuts for detecting jets and/or
hard leptons, as well as for their isolation. This, together with detector performance, implies
a sizeable reduction in the number of possible $t\bar{t}$ candidates, and demands good efficiency
in discerning real $t\bar{t}$ events from fake background events. We propose to use Neural Nets
(NNs) for the analysis of experimental data, trying to maximize the signal to background ratio
without significant losses in statistics, and apply them in particular to top analysis at the
Tevatron. NNs are by now well known for their ability to classify among different distributions
and are being used for this purpose in several high energy applications [ ]. Some examples are
the Higgs search at LHC [10], b and τ analysis [11], quark and gluon jet analysis [12], the
determination of Z to heavy quark branching ratios [13], and bottom jet recognition [14]. It has
been shown that NNs give, after proper training, the probability that a given event belongs to
some class [15], thereby providing a useful tool for classification decisions. We are not
interested here in a deep and exhaustive analysis, but rather in the possibilities that a NN can
offer for enlarging the signal to background ratio. For this reason we restrict ourselves to the
parton level, without considering hadronization, detector acceptance, resolution effects,
efficiencies, etc., in order to illustrate the potential of the NN compared with the classical
analysis in terms of cuts on a given set of variables.

*CDF has reported 12 events, with 6 events for the estimated background, with a 0.26%
probability of observing a background fluctuation. D0, by contrast, has no clear signal of the
top quark [ ].

We focused our analysis on the one charged lepton channel,

$$p\bar{p} \to t\bar{t} \to \ell\nu jjjj, \qquad (1)$$

with $\ell = e, \mu$, using the exact tree level amplitudes with spin correlations [16]. The main
background to this process is [17]

$$p\bar{p} \to Wjjjj \to \ell\nu jjjj, \qquad (2)$$

together with

$$p\bar{p} \to WW(WZ)jj \to \ell\nu jjjj, \qquad (3)$$

which is an order of magnitude smaller [18]. We have only considered the first mechanism
and have used VECBOS† [19] for its evaluation.

We have taken $m_t = 174$ GeV and have normalized the total $t\bar{t}$ cross section at the
Tevatron to 5.1 pb, a value that takes into account $O(\alpha_s^3)$ corrections and the
resummation of leading soft gluon corrections to all orders in perturbation theory [20]. CDF
measures a $t\bar{t}$ cross section of $13.9^{+6.1}_{-4.8}$ pb, which is a factor of around 2.5
larger than the theoretical value we have used. Notice that using the CDF value, the signal to
background ratio would increase by the same factor. We have used the HMRS set 1 structure
functions [21] at the scale $Q = m_t$ ($Q = \langle p_t \rangle$) for the top signal
(background). We generated events satisfying reasonable acceptance cuts on the jet, charged
lepton and missing transverse momenta,

$$p_t^{\ell},\ p_t^{j},\ p_t^{\mathrm{miss}} > 20\ \mathrm{GeV}, \qquad (4)$$

and the jet and lepton pseudorapidities,

$$|\eta^{j}|,\ |\eta^{\ell}| < 2, \qquad (5)$$

and requiring jet and lepton isolation,

$$\Delta R_{j\ell},\ \Delta R_{jj} > 0.7, \qquad (6)$$

where $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ is the distance in the lego plot. These
cuts are intended to simulate the experimental cuts needed to detect jets and hard leptons
inside the detector and to select good candidates for top production (from now on these cuts
will be referred to as acceptance cuts). The cross section after the acceptance cuts is 0.35 pb
(1.2 pb) for the $t\bar{t}$ signal (background), in good agreement with Ref. [22]. We generated
4000 $t\bar{t}$ and 4000 background events. The total number of events is essentially limited by
the time needed to generate a statistically significant sample for the background. (More
efficient generation techniques have recently been proposed [ ], which could hopefully
circumvent this problem.)

†We thank W. Giele for making the VECBOS code available to us.
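As an illustration of the acceptance cuts of Eqs. (4)-(6), the following minimal sketch applies them to a single parton-level event. The function names, the `(pt, eta, phi)` tuple layout and the wrapping of the azimuthal difference into $[-\pi, \pi]$ are assumptions made for this example; they are not taken from the analysis code used in the paper.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Lego-plot distance sqrt(d_eta^2 + d_phi^2).

    Assumption: the azimuthal difference is wrapped into [-pi, pi].
    """
    d_eta = eta1 - eta2
    d_phi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)

def passes_acceptance(lepton, jets, missing_pt,
                      pt_min=20.0, eta_max=2.0, dr_min=0.7):
    """Apply Eqs. (4)-(6) to one event.

    lepton and each jet are (pt, eta, phi) tuples with pt in GeV;
    missing_pt is the missing transverse momentum in GeV.
    """
    objects = [lepton] + list(jets)
    # Eq. (4): transverse momentum thresholds
    if missing_pt <= pt_min:
        return False
    if any(pt <= pt_min for pt, _, _ in objects):
        return False
    # Eq. (5): pseudorapidity window for jets and lepton
    if any(abs(eta) >= eta_max for _, eta, _ in objects):
        return False
    # Eq. (6): every jet-lepton and jet-jet pair must be isolated
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            _, eta_i, phi_i = objects[i]
            _, eta_j, phi_j = objects[j]
            if delta_r(eta_i, phi_i, eta_j, phi_j) <= dr_min:
                return False
    return True
```

An event is kept only if all three conditions hold; the default thresholds are the values quoted in Eqs. (4)-(6).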

Notice that the acceptance cuts have to be supplemented either with additional cuts
or with some other criterion, such as a NN, on some kinematical variables in order to classify
a single event as signal or background. This leads to a reduction of the $t\bar{t}$ and
background event samples (b tagging, for example, reduces the signal by a factor of order
0.3 [23]).

We have considered six kinematical variables in our analysis,



• i) $p_T^{\ell}$, the transverse momentum of the leptonically decaying W.

• ii) $E_T$, the total transverse energy.

• iii) $m_{W_{jj}}$, the invariant mass of the hadronically decaying W.

• iv) $m_t$, the reconstructed top mass.

• v) $S$, sphericity.

• vi) $A$, aplanarity.



Variables i and ii are completely defined when assigning the missing transverse momentum
to the undetected neutrino. The third variable requires pairing two jets with invariant mass
close to the W mass. Variables iv, v and vi need the knowledge of the longitudinal momentum of
the neutrino, which is not measured. It can however be inferred by assuming that the $\ell\nu$
pair comes from an on-shell W. This leads to a two-fold ambiguity which can be resolved to
some extent by requiring $t\bar{t}$ reconstruction along the lines suggested in Ref. [22], to
which we refer for details. The sphericity and aplanarity, computed from the lepton plus
neutrino plus 4-jet momenta, take into account the topology of the events; larger values are
expected for the signal than for the background distributions.
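The two-fold ambiguity mentioned above can be made explicit with a short sketch. Assuming a massless lepton and neutrino, the on-shell condition $m_W^2 = 2(E_\ell E_\nu - \vec{p}_\ell \cdot \vec{p}_\nu)$ is a quadratic equation in the neutrino longitudinal momentum. The function name and the default W mass below are placeholders chosen for this example, and the choice between the two roots (done in the paper via $t\bar{t}$ reconstruction following Ref. [22]) is not implemented here.

```python
import math

def neutrino_pz_solutions(lepton, missing_pt, m_w=80.4):
    """Real solutions for the neutrino p_z from the on-shell W constraint.

    lepton     : (px, py, pz) of the charged lepton in GeV (taken massless)
    missing_pt : (px, py) assigned to the neutrino in GeV
    m_w        : W mass in GeV (placeholder default, an assumption)
    Returns a list with two solutions, or an empty list when the
    constraint has no real solution for the given momenta.
    """
    lx, ly, lz = lepton
    nx, ny = missing_pt
    e_l = math.sqrt(lx * lx + ly * ly + lz * lz)
    pt_l2 = lx * lx + ly * ly
    pt_n2 = nx * nx + ny * ny
    # a = m_W^2 / 2 + (lepton pT) . (neutrino pT)
    a = 0.5 * m_w * m_w + lx * nx + ly * ny
    disc = a * a - pt_l2 * pt_n2
    if disc < 0.0:
        return []
    root = e_l * math.sqrt(disc)
    return [(a * lz - root) / pt_l2, (a * lz + root) / pt_l2]
```

When the discriminant is negative no real solution exists; how such events are treated is a separate analysis choice that this sketch leaves open.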

The usual strategy for classifying events as signal or background is to apply different
cuts on the kinematical variables considered, the six mentioned above in our case. These cuts
are usually given by simple expressions (for instance: var1 > cut1 and var2 < cut2), so that
the different regions are separated by hyperplanes in the variable space (from now on these
cuts will be referred to as kinematical cuts). Denoting by $T$ ($B$) the number of top signal
(background) events passing our selection criteria, and by $T_t$ the total number of
$t\bar{t}$ events selected after the acceptance cuts, Eqs. (4)-(6), one would like to find the
combination of cuts on the kinematical variables that maximizes the efficiency
$\eta = T/T_t$, or the purity $P = T/(T + B)$, or both simultaneously. In the latter case, one
method is to maximize the statistical significance of the filtered subsample,
$S_s = T/\sqrt{B}$, a criterion that can be used to enhance a new signal over its expected
background. In any case, this requires a subtle fine tuning of the cuts to reach the maximum,
which becomes a hard problem as the number of kinematical variables grows.
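The three figures of merit defined above are simple functions of event counts. The sketch below evaluates them; the function name and the example counts in the usage note are hypothetical and are not results of the paper.

```python
import math

def selection_metrics(t_pass, b_pass, t_total):
    """Return (efficiency, purity, significance) for a selection.

    t_pass  : signal events passing the selection (T)
    b_pass  : background events passing the selection (B)
    t_total : signal events after the acceptance cuts (T_t)
    Purity or significance is None when its denominator vanishes.
    """
    efficiency = t_pass / t_total
    purity = t_pass / (t_pass + b_pass) if (t_pass + b_pass) > 0 else None
    significance = t_pass / math.sqrt(b_pass) if b_pass > 0 else None
    return efficiency, purity, significance
```

For instance, with made-up counts `selection_metrics(400, 100, 570)`, the purity is 0.8 and the significance is 40, while the efficiency is 400/570.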

We are interested in the separation of signal and background using a layered feedforward
NN which, as we will show, avoids fine tuning in a multivariable space. A feedforward NN
consists of several layers of units called neurons. Among the layers we can distinguish one
input layer where the information comes in, one or several hidden layers where the information
is processed, and one output layer which yields the output of the NN.

The input of neuron $i$ in layer $l$ is given by

$$I_i^l = \sum_j w_{ij}^l S_j^{l-1} + B_i^l, \qquad l = 2, 3, \ldots \qquad (7)$$

$$I_i^1 = in_i^{(e)}, \qquad (8)$$

where $in_i^{(e)}$ is the set of kinematical variables for event $e$, the sum extends over the
neurons of the preceding layer $(l - 1)$, $S_j^{l-1}$ is the state of neuron $j$, $w_{ij}^l$ is
the connection weight between neuron $j$ and neuron $i$, and $B_i^l$ is a bias input to neuron
$i$. The state of a neuron is a function of its input, $S_j^l = F(I_j^l)$, where $F$ is the
neuron response function. In this paper we take $F(I_j^l) = 1/(1 + \exp(-I_j^l))$, the
so-called "sigmoid function", which is similar to the response curve of the biological neuron
and offers more sensitive modelling of real data than a linear function.
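The layer-by-layer propagation of Eqs. (7) and (8) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names and the data layout are hypothetical, and the input layer is taken to pass the event variables on unchanged, which is a common convention that the text does not spell out.

```python
import math

def sigmoid(x):
    """Neuron response function F(I) = 1 / (1 + exp(-I))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate one event through a feedforward net.

    inputs : list of kinematical variables for the event, Eq. (8)
    layers : list of (weights, biases) for layers l = 2, 3, ...;
             weights[i][j] links neuron j of the preceding layer to neuron i
    Returns the list of states of the last layer.
    """
    # Assumption: input-layer states equal the input variables themselves.
    state = list(inputs)
    for weights, biases in layers:
        # Eq. (7): I_i = sum_j w_ij * S_j + B_i, followed by S_i = F(I_i)
        state = [sigmoid(sum(w * s for w, s in zip(row, state)) + b)
                 for row, b in zip(weights, biases)]
    return state
```

A 6-6-1 topology corresponds to `layers` holding one 6x6 weight matrix with 6 biases, followed by one 1x6 weight matrix with a single bias.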



NNs, with their parallel behaviour, are able to learn from a set of given examples.
A very popular learning algorithm is error backpropagation (BP) [ ]. The main objective
of BP is to minimize an error function, also called energy,

$$E = E(in^{(e)}, out^{(e)}, w_{kl}, B_n) = \frac{1}{2} \sum_e \left( o^{(e)} - out^{(e)} \right)^2, \qquad (9)$$

by adjusting the $w_{kl}$ and $B_n$ parameters, where $o^{(e)}$ is the state of the output
neuron, $out^{(e)}$ its desired state, and $e$ runs over the event sample. Taking the desired
output as 1 for each signal event and 0 for each background event, the output of the net, after
training, gives the conditional probability that, given the observed quantities for a single
event, this event is a signal [15], provided that the ratio of signal to background in the
learning sample corresponds to the real one.
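A minimal sketch of BP for a net with one hidden layer and a single sigmoid output, minimizing the energy of Eq. (9) by on-line gradient descent, is given below. Everything specific in it is an assumption made for illustration: the function names, the learning rate, the weight initialization and the toy Gaussian sample that stands in for the six kinematical variables. It is not the training code or the data used in the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_net(n_in, n_hidden, rng):
    """Small random weights for an n_in -> n_hidden -> 1 network."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return (hidden states, output state) for one event."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out

def energy(net, sample):
    """E = 1/2 * sum over events of (o - out)^2, Eq. (9)."""
    return 0.5 * sum((forward(net, x)[1] - target) ** 2 for x, target in sample)

def train_epoch(net, sample, lr=0.5):
    """One on-line gradient-descent pass over the sample."""
    for x, target in sample:
        hidden, out = forward(net, x)
        # dE/dI at the output; for the sigmoid, F'(I) = F(I) * (1 - F(I))
        delta_out = (out - target) * out * (1.0 - out)
        # Error propagated back to the hidden layer, using the old weights
        delta_hid = [delta_out * w * h * (1.0 - h)
                     for w, h in zip(net["w2"], hidden)]
        for i, h in enumerate(hidden):
            net["w2"][i] -= lr * delta_out * h
            net["b1"][i] -= lr * delta_hid[i]
            for j, xj in enumerate(x):
                net["w1"][i][j] -= lr * delta_hid[i] * xj
        net["b2"] -= lr * delta_out

def toy_sample(n_events, rng, n_in=6):
    """Toy data (an assumption): inputs in [0, 1], target 1 or 0."""
    sample = []
    for k in range(n_events):
        target = k % 2  # 1 = "signal", 0 = "background"
        centre = 0.7 if target else 0.3
        x = [min(1.0, max(0.0, rng.gauss(centre, 0.2))) for _ in range(n_in)]
        sample.append((x, float(target)))
    return sample
```

Calling `train_epoch` repeatedly on a sample from `toy_sample` and monitoring `energy` shows the minimization at work; the number of passes needed depends on the learning rate and on the sample.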

We have used a 3 layer NN with 6 input neurons that are activated with the kinematical
variables mentioned in the previous section (normalized to 1 for convenience), a hidden layer
with 6 neurons, and a single output neuron whose desired output is 1 for the signal and 0
for the background. We have found that using 6 neurons in the hidden layer yields the lowest
minimum energy.

For the training step we have used 2000 top events and 2000 background events, which
do not correspond to the expected cross section ratio. However, since we are not interested
in the conditional probability mentioned above but in studying the efficiency and purity as a
function of the cut on the output activation of the NN, this causes no trouble and makes the
learning more efficient. As a test sample, we have taken 570 (2000) top (background) events,
statistically independent of the training ones. The top/background ratio of the test sample is
chosen equal to that obtained from the expected cross sections. All results presented have
been obtained from the test sample.

Figure 1 shows the distribution of signal and background events as a function of the
NN output activation for the test sample. We see two peaks, close to 1 and 0, corresponding
mainly to the signal and background respectively. It is clear from this plot that by cutting
on the output of the net we can obtain samples richer in signal or in background, as desired.

The solid (dashed) line in Figure 2 shows the efficiency (purity) as a function of the net
output cut. It is clear that we have to choose an output cut close to 1 if we want high purity,
or a cut close to 0 for high efficiency. The highest output cut to improve the purity, given a
fixed luminosity, would be the one that still leaves enough signal events (at least 5). This
cut will be very close to 1, because the efficiency is larger than 0.9 for any value of the
output cut except for values very close to 1.
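Curves such as those of Figure 2 are obtained by scanning a cut on the single NN output. The following sketch shows the bookkeeping; the function name and the short output lists used in the usage note are hypothetical and only serve to illustrate the procedure.

```python
def scan_output_cut(signal_outputs, background_outputs, cuts):
    """Efficiency and purity for a list of cuts on the NN output.

    signal_outputs     : NN outputs of the signal test events
    background_outputs : NN outputs of the background test events
    cuts               : output thresholds; an event passes if output > cut
    Returns a list of (cut, efficiency, purity); purity is None when
    no event passes the cut.
    """
    rows = []
    n_signal = len(signal_outputs)
    for cut in cuts:
        t = sum(1 for o in signal_outputs if o > cut)
        b = sum(1 for o in background_outputs if o > cut)
        efficiency = t / n_signal
        purity = t / (t + b) if (t + b) > 0 else None
        rows.append((cut, efficiency, purity))
    return rows
```

With toy outputs `[0.9, 0.8, 0.6, 0.2]` for signal and `[0.1, 0.3, 0.7, 0.05]` for background, raising the cut from 0.0 to 0.75 lowers the efficiency from 1.0 to 0.5 while the purity rises from 0.5 to 1.0, which is the trade-off described in the text.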

Figure 3 shows the efficiency versus the purity (solid line) when varying the NN output
cut from 0 to 0.99998. The points correspond to some hypercubic cuts applied to the six
input variables, and represent the traditional procedure (each point corresponds to a given
combination of cuts requiring variables to be larger than certain values, or masses to lie
around a certain central value, for instance $p_t > p_t^{\min}$, $S > S^{\min}$,
$m_W - 5 < m_{W_{jj}} < m_W + 5$, ...), chosen to favor the signal over the background. We
find that the NN performance, working with only one variable, the output of the net, is better
than the traditional analysis for any combination of purity and efficiency, showing the great
improvement offered by the method. A complex problem in many variables has been reduced to the
study of only one variable, the NN output, which even improves the analysis.

When the aim is to reveal the existence of the signal, the relevant quantity is the
statistical significance. Values of $S_s > 5$ are commonly accepted as proof of the existence
of a clear signal. Figure 4 shows the statistical significance versus the efficiency and the
purity for $T_t = 1$ signal events (when the number of signal events, $T_t$, is changed, the
surface in Fig. 4 keeps its shape and only rescales its height, which is proportional to
$\sqrt{T_t}$). Figure 5 shows the statistical significance as a function of the net output cut
for 7 signal events before kinematical cuts (corresponding to an integrated luminosity of
20 pb$^{-1}$). We see that the statistical significance increases as the output cut increases.
As in the case of improving the purity, the highest output cut, given a fixed luminosity, would
be the one that still leaves enough signal events (at least 5), and it is very close to 1.
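The scaling with $\sqrt{T_t}$ quoted above follows directly from the definitions $\eta = T/T_t$, $P = T/(T+B)$ and $S_s = T/\sqrt{B}$. As a short worked derivation (added here for illustration, using only those definitions):

```latex
\begin{align}
T &= \eta\, T_t, \qquad B = T\,\frac{1-P}{P}, \\
S_s &= \frac{T}{\sqrt{B}} = \sqrt{\frac{T\,P}{1-P}}
     = \sqrt{T_t}\,\sqrt{\frac{\eta\,P}{1-P}} .
\end{align}
```

At fixed efficiency and purity the significance therefore depends on $T_t$ only through the overall factor $\sqrt{T_t}$. As a rough numerical illustration, with $T_t = 7$ and $\eta = 0.9$ the condition $S_s \geq 5$ requires $P/(1-P) \gtrsim 4$, that is, a purity of about 0.8 or more.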

One of the problems faced in $p\bar{p}$ collisions is the estimation of the background.
A factor of 2 in the background could destroy any evidence of the signal. Figure 6 shows
the allowed region of the output cut versus the background factor $f$ ($f = 2$ means that the
background is twice as large as we have computed) where, for a luminosity of 20 pb$^{-1}$, we
can still obtain a 5 sigma effect with at least 5 signal events. For a fixed factor $f$, the
largest and smallest values of the output cut correspond to the highest purity and the highest
efficiency respectively. Notice that output cuts very close to 1 are not included in the
allowed region, although this is not visible in the plot.
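The condition that defines the allowed region can be written as a simple test on the expected event counts. The sketch below assumes that the background factor $f$ multiplies the expected background, so that $S_s = T/\sqrt{fB}$; the function name and the counts in the usage note are hypothetical and do not reproduce Figure 6.

```python
import math

def is_allowed(t_pass, b_pass, f, min_significance=5.0, min_signal=5.0):
    """Check the two requirements used for the allowed region.

    t_pass : expected signal events passing the output cut (T)
    b_pass : expected background events passing the cut for f = 1 (B)
    f      : factor by which the true background exceeds the estimate
    """
    if t_pass < min_signal:
        return False
    scaled_background = f * b_pass
    if scaled_background <= 0.0:
        # Assumption: with no expected background only the event-count
        # requirement applies.
        return True
    return t_pass / math.sqrt(scaled_background) >= min_significance
```

For 20 pb$^{-1}$ the cross sections quoted in the text give about 7 signal and 24 background events after the acceptance cuts, before any output cut. With illustrative counts after a cut of `t_pass=6.5` and `b_pass=1.0`, the selection is allowed for `f=1` but not for `f=2`, which shows how a factor of 2 in the background can remove a 5 sigma effect.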

Our results indicate that NNs are suitable for top analysis at the Tevatron. Although we
focused our study on a particular channel and worked at the parton level, we expect similar
behaviour for the other channels with their corresponding backgrounds, and when performing a
more realistic analysis including hadronization and detector simulation. We do not claim to
have used the best kinematical variables for our analysis, nor to have found the best NN
topology. Our aim was only to study the potential use of NNs as a cross check of the
traditional analysis in terms of cuts on a multidimensional variable space. More elaborate
studies are postponed to a forthcoming publication.

In conclusion, we have shown that a NN trained with a mixed sample of $t\bar{t}$ and
background events learns the main features of the different samples in a multivariable input
space and projects them onto a single output. This output turns out to be very useful for
discrimination between signal and background events.

Acknowledgements.

Figure Captions



Fig. 1 Distribution of the signal (dashed) and background (solid) events as a function of
the NN output activation for the test sample consisting of 570 (2000) top (background)
events. Values close to 1 (0) correspond mainly to top (background) events.

Fig. 2 Efficiency (solid line) and purity (dashed line) as a function of the NN output cut.

Fig. 3 Efficiency versus purity for the test sample. The solid line shows the NN result
whereas the points correspond to several sets of linear cuts (see text) applied to the six
input variables.

Fig. 4 Statistical significance as a function of the efficiency and purity, normalized to
$T_t = 1$ signal events. It scales as $\sqrt{T_t}$.

Fig. 5 Statistical significance as a function of the NN output cut for an integrated
luminosity of 20 pb$^{-1}$.

Fig. 6 Allowed region (shaded area) of the output cut versus the background factor $f$
($f = 2$ means that the background is twice as large as we have estimated) where, for a
luminosity of 20 pb$^{-1}$, we can still obtain a 5 sigma effect with at least 5 signal
events.